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Abstract—An_ efficient algorithm for fast field 
evaluation, namely multilevel Green’s function 
interpolation method (MLGFIM), has been successfully 
applied to solve electromagnetic problems for both low 
frequency and full-wave. Further refinements, including 
new interpolation scheme, QR factorization, and FFT 
technique, have been applied to accelerate this algorithm 
and extend the problem scale that MLGFIM can solve. In 
this paper, we present a review of different techniques in 
MLGFIM for the fast evaluations of large-scale 
electromagnetic problems. The applications of simulating 
periodic structures and layered mediums have been 
accelerated by MLGFIM as well. Some numerical 
examples are given in this paper to demonstrate the 
accuracy and efficiency of this algorithm. 


Index Terms—Method of Moments (MoM), Green’s 
Function Interpolation, Fast Solver, Parallel Computing, 
FFT. 


I. INTRODUCTION 


Calculation of antenna radiation pattern, analysis of radar 
cross section (RCS), and extraction of circuit parameters are 
often performed using integral equation-based computational 
techniques, e.g., the method of moment (MoM) [1]. However, 
the computational requirements for MoM are very high, and 
various fast numerical methods have been proposed to 
accelerate the computation, for instance, multilevel fast 
multipole algorithm (MLFMA) [2]-[4], pre-corrected fast 
Fourier transform (FFT) [5],[6], adaptive integral method 
(AIM) [7],[8], and sparse matrix canonical grid (SMCG) 
[9],[10]. The first method uses multipole expansion to 
approximate the far field interactions, while the later three 
methods apply FFT for fast field evaluation. Although these 
methods calculate the matrix-vector multiplication indirectly 
and thus significantly reduce the computational complexity, 
each of the methods has its own weaknesses. The FFT-based 


methods are most suitable for densely-packed structures, in 
which the unknowns are associated with most of the grid 
points. The PFFI approach employs the _ bi-linear 
interpolation method, and it is not suitable for analysis of 
full-wave electromagnetic problems as bi-linear interpolation 
is not accurate for the rapidly-varying kernels in full-wave 
applications. On the other hand, the multipole-based 
algorithms depend on the type of Green’s function used for 
the specific problems and they become cumbersome when 
dealing with structures embedded in a multilayered medium. 
When the simulated structure changes, for instance, insert an 
additional layer of dielectric substrate, the problem has to be 
reformulated. 


Some interpolation-based fast algorithms have the potential 
to address many of the aforementioned weaknesses. In [11], 
Brandt developed a multilevel interpolation approach, in 
which a softened kernel and phase compensation is employed 
when the kernel is oscillatory. Non-uniform grid (NG) [12], 
[13] and multilevel NG [14],[15] algorithms, as kernel 
independent algorithms, are proposed for fast field evaluation. 
With the technique of phase and amplitude compensations, 
NG-based algorithms sample the compensated field using a 
non-uniform polar grid. The NG and multilevel NG methods 
achieve computational complexities of O(N*”) and O(NlogN), 
respectively. Adaptive cross approximation (ACA) [16],[17] 
employs interpolation approach directly on low-rank 
sub-matrices denoting the far field interaction, and thus this 
method is not difficult to implement. The H? matrix-based 
method is proposed in [18],[19]. It applies multilevel 
interpolation with Lagrange interpolation function and Tartan 
grid. 

Another kernel-independent interpolation algorithm, viz. 
multilevel Green’s function interpolation method (MLGFIM), 
has been developed to solve quasi-static [20]-[22] and 
full-wave electromagnetic problems  [23],[24]. The 
application of an octa-tree structure and non-requisite of a 
softened kernel distinguish the MLGFIM with the algorithm 
in [11]. For full-wave problems, instead of using Lagrange 
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interpolation as in the quasi-static problems, radial basis 
function (RBF) interpolation is employed. Lagrange 
interpolation method can not be directly applied for the 
approximation of the full-wave Green’s function, unless 
phase compensation technique is incorporated [25]. The RBF 
interpolation has excellent approximation properties [26], and 
we find that if the object is not extremely large, the full-wave 
Green’s function can be interpolated without the 
compensation technique, which is different from the 
NG-based algorithm. Although ACA is flexible for solving 
problems with a complex medium, it is difficult to enhance 
the efficiency by using the translational invariance 
characteristics of the Green’s functions in some problems, e.g. 
the multilayered planar problem. In addition, the multilevel 
structure of MLGFIM also differs from ACA. Compared with 
the H? matrix-based method, the applications of RBF 
interpolation and staggered Tartan grid (STG) make the 
number of interpolation points in MLGFIM increases more 
slowly. This is because the interpolation points at the coarser 
levels do not coincide with those at the finer levels in the 
MLGFIM, viz., the volumetric density of interpolation point 
at the coarser levels is lower than that at the finer levels. 
Moreover, some novel techniques used in MLGFIM, e.g. QR 
factorization [27] and hybrid quasi-2D/3D multilevel 
partitioning approach [24], distinguish itself from other 
interpolation-based methods. In addition, the applications of 
peer-level and lower-to-upper-level interpolations make the 
MLGFIM very efficient for the analysis of large-scale 
problems, and the computational complexity of 3-D full-wave 
MLGFIM is O(NlogN) [23]. 


Although so many fast algorithms have been developed, 
there are still many challenges in the electromagnetic 
simulations. For instance, electromagnetic enhancement of 
surface enhanced Raman scattering (SERS) attributed to the 
enhanced E-fields in the vicinity of plasmonic nanostructures 
is the major approach of SERS enhancement [28],[29]. When 
the exciting electromagnetic radiation approaches the 
resonance frequency, the enhancement factor which is 
proportional to the fourth power of the gain in the 
electromagnetic field becomes very difficult to evaluate. To 
accurately calculate the electromagnetic field, the 
nanoparticles must use deep sub-wavelength discretization. 
Due to interpolation-based algorithm, MLGFIM is capable of 
dealing with the low-frequency and multibody structure. In 
addition, some thin strip with high dielectric constant has been 
used to promote the magnetic emission in quantum emitters 
[30],[31], and we found when the emitter is close to the 
scatters, the near-field is difficult to be accurately computed 
as well. The high-contrast material makes the whole structure 
extremely in-homogenous, and thus deep sub-wavelength 
mesh is also required. Moreover, the analysis of metasurfaces, 
e.g. holographic metasurfaces [32],[33] which excites surface 
waves with the desirable field distributions, is challenging. 
For many of the holographic structures, the size and shape of 
each sub-scatter which stuck on a substrate may be different, 
so the computational burden is very large. This problem can 
be released by adopting multilayered medium Green’s 
function, and MLGFIM is able to be implemented based on 
this complex Green’s function. 


In this paper, various techniques for the MLGFIM to solve 
electrically large electromagnetic problems are reviewed. The 
MLGFIM used for accelerating the matrix-vector 
multiplication is introduced first. QR factorization technique 
is employed to compress the matrix storage. In order to make 
the MLGFIM more efficient, different radial basis functions 
(RBFs) are compared and interpolation grid patterns are 
designed so that the MLGFIM could be accelerated by 
applying newly proposed interpolation schemes. Moreover, 
the technique of fast Fourier transform (FFT), which has been 
combined with MLFMA [34],[35], could also be applied to 
reduce the translation time in MLGFIM. The parallelized 
MLGFIM versions based on OpenMP and message passing 
interface (MPI) techniques have been realized to make this 
fast solver more efficient for the large-scale electromagnetic 
problems. Since the MLGFIM is a kernel-independent 
method, it is easy to use MLGFIM for the analysis of periodic 
structures and multilayer problems. Numerical results 
demonstrate the good accuracy and efficiency of the above 
implementations of the MLGFIM. 


II. DESCRIPTION OF MLGFIM 


For the 3-D free-space aperiodic problems, it is always 
described by electric-field integral equation (EFIE), 
magnetic-field integral equation (MFIE) and combined-field 
integral equation (CFIE). With boundary conditions imposed 
on the surface of the body, another integral equation, 1.e. 
Poggio-Miller-Chang-Harrington-Wu-Tsai (PMCHWT) 
integral equations, could be used to analyze 3-D composite 
metallic and dielectric bodies with surface discretization. 
Using Galerkin method, the aforementioned integral 
equations are converted to NxN dense matrix equations, viz., 


Ax=b (1) 
where * can represent EFIE, MFIE, CFIE or PMCHWT. For 
simplicity and clarity, we use EFIE as an example. The entries 


in NxN matrix AP and Nx1 vector b ™ 


A, =| ds] asti œj’ 


are written as: 





(2) 
= V-j,@)V'-j,@)1G@.r’) 
O ME, 
and 
b mya) | dsj.(r)-E (r) (3) 
QOH) si 


where r,r’, @, &, Lo, J), G(r,r’) and E” (r) are the 
observation point, source point, angular frequency, free space 
permittivity and permeability, basis function, interaction 
function, and incident electric field, respectively. 


To apply MLGFIM, (2) is decomposed into scalar form as: 
A,=A*+A?+Ai—A® (4) 
ij 1J 1J 1J J 
where 


A; =| ds] as} 0I ENGE, r’) 5 


#= X,Y OF, Z 
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and 


d _ 
A; = 2 
O MoEo 


Each scalar part A, in (5) and (6) has the same form as: 


A, = I, ds f ds'r,(r)¢(r")G(r,r") 


| ds | ds'(V-j,@)V'-j,@)1G@.r’) © 


(7) 


where 7,(r) and ap (r^) are related to the weighting function 


and the basis function, respectively. 

















(b) 


Fig. 1. (a) 2-D pictorial representation of 3-D multilevel division in which 
the interaction list of box n> at level 2 are marked with m, and the interaction 
list of box n; at level 3 are marked with m3. (b) 2-D pictorial representation of 
3-D locations of the interpolation points (white circles) and interpolated 
points (black circles) are in the boxes m and np. 


In [24] and [36], it is proven that PMCHWT and CFIE 
which are associated with both EFIE and MFIE can also be 
decomposed into the same form as (7). The interaction 
functions G(r,r’) corresponding to the EFIE and MFIE can 


be expressed as: 


(ik\r—r'|) 
ræ (for EFIE) 
r-r 
GT,r)=] w (8) 
Ze r-r']) , 
TE (ik Ir -r 
F-r' 
For the far-field interactions, instead of directly calculating 
the function Gcr,r’), MLGFIM use interpolation method in a 





—1)(for MFIE) 


multilevel tree to quickly approximate this function. The 
multilevel tree is constructed by enclosing the object into a 
box (level 0) and recursively dividing the large box into 
sub-boxes. Box m is the neighbor/interaction list of box n; at 
level / if m; is near/well separated from n;, and their parents are 
neighbors of each other, as shown in Fig. 1(a). If box mis the 
interaction list of box n; at level /, the interaction function 
between these two boxes could be obtained by interpolation 
method, as shown in Fig. 1(b), viz.: 


K, K, 
G(r.r')=Y Yo, ,(r), (G(r, r) 


i=l j=l 


= 6) (r)G,,,,,@, (r’) 


wherer and r’ ; are the i-th and j-th interpolation points in 
m,i nį, 


(9) 


field box m and source box nı, O, (r) and O, ; (r') are the i-th 


and j-th interpolation functions, respectively. K; is the number 
of interpolation points at level J. 


Moreover, using the interpolation method, G and 
Miz] 
G can also be interpolated using G_ [20], viz.: 
Mı M44 m Nj 
G mar = C maim, G m;n (10) 


= =T 
G mima 7 G m:n C mina 


where m, cm cn,, and matrix Cmm iS Named as 


1? Niy 
lower-to-upper-level interpolation matrix. If the box mı is 
interaction list of box n; are at level l, G(r,r'^) at the finest 
level L can be obtained using (9) and (10), viz.: 


=T: 


G(r, r’) = Dn. (r) > Ci : T , C mina : O, (r’) (1 1) 
= = == = =T 
= ©, (r) ° Cm m, E E ° Gmn ; Cin: ln in, $ ©, (r’) 
Substituting (11) into (7) gives: 
~ = = = =T =T 
A, =I | BE O)O, I Cacaso atm Omen C wma’ wim 
i (12) 


If, as'6,(r')0,, Œ] 


= = = = 


SPD) aa min Cn aa ea 2. 0) 

It should be mentioned that if box mz is the interaction list 
of box nz at level L, there is no lower-to-upper-level 
interpolation matrix in (12), and it becomes the conventional 
peer-level interpolation. Subsequently, the sub-matrix 
A can be rewritten as: 


myn, 


Taa) 


fot) [eo eo 


«eC ma:mm i G m:n, ` C nina Cn 


>:| 
| 


;n 


m- m. L-7 


Mng L?’ L-1 


PT 
TiN (Ty, ) 


E D (13) 
zai Za 


vA ’ 
Zn, Nnr (Ty, )| 


= = =p = 


= = = T 
= Tan, Cm, m, +C mm Gmi Cry stay Cn, ain, -Z 


n, 
where N P and N, denote the number of unknowns in boxes 
E L 


mı and nz, respectively. From (13), it is observed that only the 
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interpolation matrixes T, and Z, at level L should be 


calculated. Assuming there are Pz boxes at level L, the 
MLGFIM algorithm can be finally shown as Fig. 2. 
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Fig. 2. Flow chart of the implementation of MLGFIM. 


Considering the interaction function matrix Gm ;n 


represents interactions between the interpolation points of two 
well separated boxes, it is in low rank and it could be 
factorized with QR factorization technique as: 


= Q m, n Rasin (14) 


G m;n 
where Q, p and Rm:„ are the K;x r; unitary matrix and the r; 
P 


x Kı upper triangular matrix. r; is the numerical rank of Gn, as 
and the condition 7 << Kı always holds. Instead of directly 
store the Green’s function matrix Gan which is a Kı x Kı 


full-matrix, we use (14) to compress the Green’s function 
matrix and it also brings a great advantages in the computation 
of step 3 and 4 of Fig. 2 [23]. 


Ul. THE INTERPOLATION SCHEME FOR MLGFIM 


The accuracy and efficiency of MLGFIM tightly depend on 
the interpolation scheme for the approximation of interaction 
function. RBF interpolation method has been proven to be an 
appropriate approach for the interpolation of interaction 
function [23]. A lot of efforts have been made on searching a 
better RBF interpolation scheme to improve the efficiency of 
MLGFIM. Generally, an RBF p, (R) = g(r -r l) is a 


continuous function that depends only on the distance 
between the data (interpolated) point r and the center 
(interpolation) point r; Considering an approximation 
function f (r) in an influence domain that has a set of K 


arbitrarily distributed nodes with corresponding 
values { f (r, JA _» the RBF interpolation reads: 
fr)=[9(R) p (R) Pk(R)| © as) 


Lf) fa) = call 
the entries 
P, =ọ(|r, -r, ). 


There are many types of RBFs that can be used as the 
interpolation function. Table 1 shows three types of RBFs that 
have been used in MLGFIM. Parameter c in these expressions 


where of matrix D 


are expres sed as 





is named as the shape parameter, and it determines the shape 
of the interpolation function. In [37], it has proven that GA 
RBFs can obtain better interpolation accuracy than IMQ 
RBFs which have been used in [23] with the same number of 
interpolation points, because they have better convergence 
behavior. As a new class of oscillatory RBFs, BE RBFs have 
shown their good interpolation performance in [38], and have 
been applied to further enhance the efficiency of MLGFIM as 
well [39]. 


Table 1. Definitions of Some Typical Radial Basis 
Functions 


Infinitely smooth functions 


Inverse multiquadric (IMQ) 


Gaussian (GA) exp(—c R?) 
TE (cR) 
Bessel (BE) (R v=l,2, 





grid element { 


















































(c) 

Fig. 3. 2-D pictorial representation of (a) original STG, (b) modified STG 
and (c) Boundary clustered STG (square points, hollow and solid dots are 
volume center, surface center and vertex points in a grid element, 
respectively). 


Besides, the distribution of interpolation grid also affects 
the interpolation efficiency of Green’s function. Tartan grid, 
as a conventional grid pattern which is always used in the 
Lagrange interpolation, just distributes interpolation points 
evenly spaced in the source/field boxes, so the total number of 
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interpolation points is K = Kip°. Since the RBF method allows 
us to use scattered point distribution, STG which outperforms 
Tartan grid in accuracy is used in MLGFIM. Three types of 
STGs have been applied as the grid patterns which are shown 
in Fig. 3. The original STG (Fig. 3(a)) consists of Tartan grids 
and the points in the volume center of grid element of which 
vertexes are formed by adjacent Tartan grids [23]. And the 
modified STG, as shown in Fig. 3(b), consists of Tartan grids 
and the points in the center of the sides of the grid element 
[37]. Hence the total number of interpolation points for the 
original STG and modified STG are K,p°+(Kip-1)° and Kip 
343K p(Kip-1)*, respectively. It has been proven that with 
similar number of interpolation points, the results using the 
modified STG are more accurate than those using the original 


STG [37]. This 1s because the condition number of matrix D 
generated by the modified STG is better than the original 
STG. However, it is pointed out that an accurate method on 
high-degree uniform grid sets leads to large errors near the 
boundaries [40],[41]. So if the boundary errors can be 
suppressed, the interpolation accuracy will be further 
enhanced. Boundary clustered STG is generated by making 
the points denser near the boundary [42]. In 1-D, the 
Chebyshev-Lobatto nodes in the closed interval [-1, 1] are 
expressed as: 


| 2 |Z esot (16) 
K,,-1} 2 





where hs is the location of s-th interpolation point. However, it 
has been proven that the 3-D non-uniform grid generated by 
simply clustering the interpolation points toward boundary 
using (16) in x, y and z directions cannot improve the 
interpolation accuracy [42]. This is because this grid helps to 
reduce the boundary interpolation errors but it induces large 
interior errors as well. In order to balance the boundary and 
interior errors and make interpolation points still symmetric to 
the center of interval, the 1-D grid pattern in the closed 
interval [0, /] is generated as: 


(17) 





ô 
L 1—cos 2s mae s=0, Seat 
2 Kp-1j} 2 2 
ô 
2s—2-| (Kip +1)/2 
£ 1+cos 25—2-| (Kp +))/2| ea =| Eet ky 
2 K,,—1 2 2 


In function (17), the parameter 61s introduced to adjust the 
degree of boundary clustering. The interpolation points, as ó 
approaches 0, are distributed at //2. Increasing 6 will shift the 
interpolation points toward the boundaries, and it improves 
the resolution at the boundary but lowers the resolution in the 
interior box. By adopting function (17) in x, y and z directions, 
we obtain a boundary clustered Tartan grid with Kip” 
interpolation points. Then, the boundary clustered STG is 
generated by adding 3Kip(Kip-1)* interpolation points to the 
center of the sides of the rectangular block (grid element), as 
shown in Fig. 3(c). 


The maximum interpolation errors using three different 
interpolation schemes are compared in Table 2. The first two 
schemes employ the same interpolation grid (viz., original 
STG), and the threshold of the maximum error for the second 
scheme is set as 0.05. With the same number of interpolation 
points, the maximum errors using IMQ RBFs are always 
larger than those using BE RBFs, as shown in Table 2. And as 
the edge length of the box increases, the method using BE 
RBFs becomes more accurate than IMQ RBFs. The third 
scheme adopts a new interpolation grid pattern, viz., boundary 
clustered STG, to further enhance the accuracy and efficiency 
of the interpolation. With the same error threshold, the 
method using boundary clustered STG requires less number 
of interpolation points than the original STG. Moreover, the 
interpolation accuracy using the third scheme, even with less 
number of points, is better than the other two. And it is also 
observed that if we employ the third scheme, much more 
number of interpolation points is saved as the size of box 
increases. The efficiency of MLGFIM can be dramatically 
enhanced by adopting the third method. 


Table 2. Comparison of accuracy using different kinds of interpolation schemes. 


l Number of 
Interpolation Box enoo 
scheme length (A) n tS 
1 189 
and original 

STG 3 1241 
4 2331 

1 189 

BE RBFs and 2 559 
original STG 3 1241 
4 2331 

1 172 

BE RBFs and 2 365 

boundary 

clustered STG 3 666 

4 1099 


Optimized 


Optimized O iape Maximum 
in (17) parameter c error 
2.10 0.0068 1 
0.80 0.0731 
0.69 0.131 
0.51 0.296 
6.00 0.00632 
3.20 0.0174 
2.35 0.0295 
1.705 0.0498 
0.75 6.90 0.00439 
0.97 3.21 0.0125 
0.96 2.10 0.0291 
0.83 1.60 0.0465 
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IV. MILGFIM FOR PERIODIC STRUCTURES AND 
MULTILAYERED PROBLEMS 


As a kernel independent algorithm, MLGFIM can also be 
used for the fast analysis of other complex applications, e.g., 
periodic structures and multilayer problems. The Floquet 
theorem allows limiting the analysis of periodic structures to 
that of a unit cell; however, the number of unknowns can still 
be large with there are fine features exist in the unit cell. 


For periodic structures, instead of interpolating free-space 
Green’s function, we should interpolate periodic Green’s 
function. Ewald’s transformation [43],[44], which 
efficiently combines both spatial and spectral formulations 
of the periodic Green’s function, is applied to speed up the 
evaluation of periodic Green’s function. Besides, since the 
unit cells are actually connected together, the periodic 
boundary condition must be taken into consideration. Half 
basis functions at the connection boundary between two 
adjacent unit cells have to be introduced to ensure the 
boundary currents continuous. 


After that, we can use MLGFIM to accelerate the analysis 
of periodic structures. There are some modifications in the 
definition of neighbor and interaction list when applying 
MLGFIM for these structures, and they are defined as [45]: 
box m is the neighbor/interaction list of box n when box m is 
near/well separated from box n and the images of box n, and 
parent box of box m is also near parent boxes of box n and 
images of box n simultaneously. Fig. 4 shows the neighbor 
and interaction lists of box n at level 3. In order to implement 
MLGFIM easier, the box of interaction list which is not 
around box n (but around the image of box n), e.g. box mı in 
Fig. 4, should be moved to the place of box m2 (image of m1) 
which is around box n. 


E 
OZAN 
x qT 


Ga. Tl 





oOo Pa 


Fig. 4. 2-D representation of a 3-D multilevel division in which the 
neighbor (white colored) and interaction (slash) lists of box n at level 3 are 
shown. 


In order to address the problem of the value of optimum 
shape parameter c shift during the interpolation, RBF-QR 
method is applied for the interpolation of periodic Green’s 
function [46]. As the periodicity of one structure changes, 
the interpolated function (viz., periodic Green’s function) 
changes, and consequently the optimum value of shape 
parameter c in RBF should be retested. Because the RBF-QR 
method is insensitive to the value of c, it 1s not required to 


retest the shape parameter when the interpolated function 
changes. By using expansion with Chebyshev polynomials 
and QR-factorization to the Gaussian RBFs, a better 
conditioned interpolation matrix with its condition number 
insensitive to c is generated [46],[47]. With RBF-QR 
interpolation, MLGFIM becomes more robust for the 
analysis of periodic structures. 


By interpolating the multilayered Green’s function, 
MLGFIM has been applied to solve the problems of 
multilayered medium structures. An improved discrete 
complex image method (DCIM) which uses a new sampling 
path directly goes through the branch point is adopted for the 
evaluation of the multilayered Green’s function [48]. This 
method helps to capture the contribution to the lateral wave 
more accurately. The property of planar structures of 
multilayered problems demands only 2-D interpolation 
scheme (viz., the interpolation points are only distributed at 
the interfaces of every two layers), while 1-D interpolation 
scheme is used for the vertical via holes only at the finest 
level [49]. We will later show that this interpolation 
approach is very efficient for the analysis of multilayered 
medium problems. 


V. OTHER APPROACHES TO COMBINE WITH MLGFIM 


Given that the translation occupies most of the 
computation time in the interpolation procedure of 
lower-to-upper-level, it is possible to further enhance the 
efficiency of MLGFIM by reducing the translation time. 
Although the interpolation performance of 3-D Tartan grid is 
not as good as the STG-based grid patterns, it generates a 
3-D Toeplitz translation matrix G which could use FFT to 
accelerate the matrix-vector multiplication. Moreover, the 
difference in the interpolation efficiency between 2-D 
Tartan grid and STG-based grid patterns is smaller than that 
of the 3-D case, and hence the MLGFIM-FFT can be used to 
accelerate the solution, especially the solution of multilayer 
problems where 2-D interpolation scheme is adopted. 


For the multilevel tree structure, there are two ways to 
implement the FFT-accelerated translation procedure. In 
[SO],[51], the multilevel tree will be truncated at certain 
level, and the pre-corrected FFT approach is applied at this 
level. A global Toeplitz matrix is generated and the 
interactions among all boxes are efficiently calculated by 
performing only one FFT. However, a pre-corrected near 
field evaluation is required for this method. So the 
translation (also shown in Fig. 2) is rewritten as a 
convolution form, viz.: 

Bn = FFF '\ FFT (G)-FFT(S) = Gin Sn, (18) 


n eneighbors of m; 


where G is comprised of Green’s function matrix Gm :n 


and § consists of vector S,,. Below the truncated level, the 
lower-to-upper-level interpolation as shown in Fig. 2 could 
be carried out as well [50]. 

The second way is based on local boxes, so it generates a 
sub-domain Toeplitz matrix. Consequently, sub-domain 
FFT is applied to accelerate the translation. Pre-correction is 
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not required because the sub-domain Toeplitz Green’s 
function matrix is formed by two well separated boxes [52]. 
Hence, instead of (18), the translation could be expressed as: 


Bm, = CT mii Sn, 


n €interaction list of m; 


= FFF >. 


n Einteraction list of m, 


(19) 


FFT (Gmn FFT (S», | 


where Gm:„ is a vector extracted from the Toeplitz matrix 


Gn „ - Lhe FFT is not implemented only at one level, but at 


many high levels unless the number of interpolation points is 
too small and direct matrix-vector multiplication will be 
more efficient than the FFT. With the second method, no 
truncated level exists, so the lower-to-upper-level 
interpolation is implemented from the top level to the bottom 
level to make the MLGFIM-FFT more efficient. 


In addition, parallel computing could also be applied to 
accelerate the solution. Parallelized MLGFIM using OpenMP 
[50] and MPI [36] have been developed. For achieving high 
efficiency, we must make workload balanced and that could 
be accomplished by assigning approximately the same 
number of boxes at each level into each processor. Different 
from the OpenMP parallelization on a _ share-memory 
computer system, the MPI parallelization on a 
distributed-memory computer must take into account of the 
communications among different processors as well. If the 
upward or downward pass shown in Fig. 2 is implemented 
through different processors, a large amount of data 
communication is required. Hence, each root box (at level 2) 
and all its descendants should be assigned to the same 
processor [36]. Besides, in order to reduce the data 
communication in the computation of near field, the 
neighbors of each finest-level box are duplicated if they are 
not assigned to the same processor. 


Parallelized MLGFIM on a distributed-memory computer 


system can be developed by modifying the steps of the 
original MLGFIM. First, the matrix-vector multiplication 


c= Zn, Xn, is calculated at the finest level / = L with its own 


sub-tasks. Second, the vectors ey Cun. Sn, for each 
box assigned to this processor are computed from level / = L - 
1 to 2. Then the translation from the source box to the 
interaction list is computed. Data communication is required 
in this step, and we use MPI Allreduce command to collect 
the vectors Sy, from each processor, and distribute it to all 


After that, the vectors 
G. 5. +c... p, in the corresponding 


processors. 


Bn, = 


n,€interaction list of m, 


boxes for the processor from level / = 3 to L are computed. 
Finally, the matrix-vector multiplication b, =b + Tn, Bn, 1S 
performed and the vector 7, is also collected and distributed 


to all processors. 


Certain factors should be considered when examining the 
efficiency of parallel computing. Intuitively, a parallel 
program leads to a shorter execution time for a given 


computation task, which can be expressed by a speedup and 
efficiency factors as: 





s(n,)=t,/t, (20) 
and 
t 
e(n, )=——x 100% (21) 
n, XT, 


where t, and t, are serial execution time using one processor 
and parallel execution time using np processors, respectively. 
If the efficiency factor approaches 100%, this indicates that 
the parallel computing is very efficient and the 
communication/computation ratio is very small. 


VI. NUMERICAL EXAMPLES 


In order to demonstrate good performance of the proposed 
method, some examples are given as follows. The generalized 
minimal residual (GMRES) iteration method with a relative 
error norm of 10° is employed, and the inner loop of GMRES 
contains 100 matrix-vector multiplications. 











0 | 20 40 60- 80 100 120 140 160 180 
0 (degree) 
(c) 
Fig. 5. Plane wave impinges normally on a fishnet-type array. (a) 
Geometry. (b) Unit cell. (c) Bistatic RCS with 06 polarization. 


In the first example, MLGFIM is applied to plane wave 
scattering from a fishnet-type array with 10x10 elements, as 
shown in Fig. 5(a). The geometrical parameters of the array 
are as follows: w1 = 10 um, w2= 17.5 um, d= 115 um, t=9 um. 
The patterns on the top and bottom surfaces of the substrate 
are made of the same shaped PEC, and the relative 
permittivity of the substrate is 2.25. A plane wave with 
electric field parallel to the x-axis propagates along —z-axis, 
and the center frequency is 0.5 THz. PMCHWT integral 
equation is used for this structure to obtain the surface current. 
The bistatic RCS with 00 polarization is calculated using the 
5-level MLGFIM, and compared with MoM, as shown in Fig. 
5(b). From this figure, it is observed that these two results 
agree very well. 
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Fig. 6. A dipole in front of an 11x11 dielectric sphere array. (a) Geometry. (b) Polar plot of the directivity versus ø in dBi when = 90°. (c) Distribution of 


Poynting vector in yz plane. 


Table 3. Comparison of CPU Time and Memory Requirement for Simulating the 11x11 Dielectric Sphere Array 


Maximum The 
Method electrical number of 
length(2a) levels 
Original MLGFIM 
16.87 6 
Improved MLGFIM 


In the second example, a radiation problem that a dipole in 
front of an 11x11 dielectric sphere array (g = 2.25) is 
investigated, as shown in Fig. 6(a). The radius of the sphere is 
2 cm and the distance between the centers of two adjacent 
spheres is 5 cm. A Hertzian dipole which works at the 
frequency of 6.25 GHz is located 10 cm in front of the array 
center and placed along the z-axis. In this example, ten grids 
per wavelength A, (where Ay =A / ae. ) are used to discretize 


this structure, so that the number of unknowns for equivalent 
electric and magnetic currents is 144,020. The polar plot of 





The number CPU time for per 
; Memory 
of matrix—vector PEOR E 
unknowns multiplication q 
96.3 s 8.7 GB 
144,020 
21.4s 3.1 GB 


the directivity versus @when@= 90° is shown in Fig. 6(b). 
Two methods, 1.e., original MLGFIM (IMQ RBF and original 
STG) and improved MLGFIM (BE RBF and boundary 
clustered STG), are used for the simulation. Good agreement 
can be observed between these two results. The distribution of 
Poynting vector in the plane of x = —20 cm is also calculated, 
as shown in Fig. 6(c). The computational time and memory 
requirement for this example are shown in Table 3. Compared 
with the original method, only 22.2% CPU time and 35.6% 
memory are required for the improved one. 
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Fig. 7. Plane wave scattering from a dielectric prolate spheroid. (a) 


Geometry. (b) Bistatic RCS with ¢¢@ polarization at 300 MHz. (c) CPU time. 
(d) Memory requirement. 


Next, an example is provided to show the CPU time and 
memory requirement with different number of unknowns. A 
dielectric prolate spheroid is considered, as shown in Fig. 
7(a). The long axis of the prolate spheroid is 6 m, whereas the 
short axis is 1.5 m. A plane wave with a frequency of 300 MHz 
impinges on the prolate spheroid along —x-axis. The plane 
wave scattered by prolate spheroids composed of double 
positive (£= 2.25, u = 1.0), single negative (€= —2.25, = 
1.0) and double negative (s= —2.25, u= —1.0) medium are 
studied. Fig. 7(b) compares the bistatic RCS with 
ġġ polarization for these three cases. The forward scattering is 
enhanced significantly due to negative permittivity and 
permeability. The CPU time for performing each 
matrix-vector multiplication and the corresponding memory 
requirement are shown in Figs. 7(c) and (d). In these two 
figures, about 20 grids per wavelength are used to discretize 
the prolate spheroid, and the frequency varies from 150 MHz 
to 660 MHz. Three types of numbers of levels are used for the 
calculation of the structure, and the original MLGFIM cannot 
simulate the structure when the number of unknowns is larger 
than 5x10° because of the memory limitation. From Fig. 7(c) 
and (d), it is observed that the CPU time and memory 
requirement of the improved MLGFIM are always smaller 
than those of the original one. And when the number of 
unknowns increases to 1,089,224, the CPU time for 
performing one matrix-vector multiplication and the memory 
requirement for the improved MLGFIM are 128.7 seconds 
and 17.68 GB, respectively. In addition, it is also observed 
that the improved MLGFIM approximately obeys a 
computational complexity of O(N log N) and memory 
complexity of O(N). 
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Fig. 8. Plane wave scattering from a metallic scatterer. (a) Geometry. (b) 
Unit. (c) Bistatic RCS with 00 polarization. 


Table 4. Comparison of CPU Time and Memory 


Number Number of 
of levels processors 
1 
2 
7 
4 
8 


Requirement 
Speed-up E am 
1.0 100 0 
1.98 99.2 49.6 
3.66 91.5 72.4 
6.51 81.3 84.6 


To show the effect and accuracy of parallelized MLGFIM, 
we consider the plane wave scattering from a complicated 
metallic scatterer, as shown in Fig. 8(a). This structure is 
composed of orthogonal square rings, and four square rings 
orthogonally connect with one ring to generate a unit, as 
shown in Fig. 8(b). The outer length, inner length and height 
of each ring are 3 mm, 1.8 mm and 0.6 mm, respectively. On 
the boundary of this structure, there exists only half rings, and 
hence the total structure contains 6 orthogonal rings in each 
direction. The structure is investigated at 300 GHz and 
illuminated by a plane wave propagating along the 
—x-direction, with the electric field polarized in_ the 
z-direction. After the structure 1s discretized, the total number 
of unknowns obtained is 1,134,768. The bistatic RCS result 
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calculated by the serial MLGFIM and parallelized MLGFIM 
using 8 processors are compared in Fig. 8(c). Good agreement 
is observed from this figure. Furthermore, the speed-up factor 
and efficiency factor are demonstrated, as shown in Table 4. 
According to this table, we observe that the computational 
time is reduced significantly with the increase of the number 
of processors. The efficiencies of the parallelized MLGFIM 
are larger than 80% for all cases. The CPU time is reduced by 
84.6% after calculating this problem with eight processors, 


simultaneously. 
&=3.0 0.027 mm 


4.83 mm 
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(b) 

Fig. 9. FSS comprising two centered square loops on a dielectric substrate. 
(a) Side and front view of the unit cell. (b) TM reflection coefficient with 
different values of d. 

In the fifth example, MLGFIM is used to analyze a periodic 
structure, in which two centered square loops are etched on a 
dielectric layer (€= 3). Fig. 9(a) shows the geometry of the 
frequency selective surface (FSS), and it is illuminated by a 
uniform plane wave incident from the normal direction. 
Periodic Green’s function interpolation and periodic 
boundary conditions are used for this structure. The results for 
different dimensions d of the inner loop are calculated by 
MLGFIM and compared with the simulation results in [53] 
and experimental results in [54]. RBF-QR method is adopted 
as the interpolation approach, so that the MLGFIM becomes 
more robust for the analysis of periodic problems. From Fig. 
9(b), it can be seen that our results are closer to the 
experimental results compared with the simulation results in 
[53]. And we also find that the second resonance shifts toward 
a lower frequency band with the increase of the dimension d. 
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(c) 
Fig. 10. Plane wave scattering from a patch array. (a) Geometry. (b) 
Normalized E-field on xz plane. (c) CPU time and memory requirement. 


1E+006 


In the sixth example, MLGFIM is applied to analyze some 
patch arrays with planar multilayer substrate, in which 
sub-domain FFT is combined to accelerate the calculation. A 
plane wave with the electric field parallels to the y axis 
incidents from the normal direction. The parameters of the 
microstrip patch array are L = 49 mm, w = 41 mm, pa = 90 
mm, pp = 82 mm, as shown in Fig. 10(a). The thickness and 
dielectric constant of the substrate are 1.59 mm and 2.2, 
respectively. The working frequency is set as 2.2GHz, and the 
normalized electric fields on xz plane for 8x8, 16x16, 32x32, 
and 64x64 arrays are shown in Fig. 10(b). In Fig. 10(c), the 
efficiencies, using MLGFIM and MLGFIM-FFT, for 
analyzing these arrays with different number of unknowns are 
compared. For this example, 2-D interpolation method is 
applied, and hence the CPU time and memory requirement are 
both much smaller than those of the above examples which 
use 3-D interpolation. From Fig. 10(c), it is observed that the 
CPU time and memory requirement using MLGFIM-FFT are 
always smaller than those using MLGFIM alone. With the 


10 


Forum for Electromagnetic Research Methods and Application Technologies (FERMAT) 


increase of the array size, the reductions of time and memory 
become more significant. 
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Fig. 11. Radiation of a 32x32 parallel fed antenna array. (a) Geometry. (b) 
Unit. (c) Radiation pattern on H-plane (ø= 0°). (d) Radiation pattern on 
E-plane (¢= 90°). 


Finally, the MLGFIM-FFT method is used to calculate the 
radiation pattern from a 32x32 patch array with planar 
multilayer substrate. The structure of this array is shown in 
Fig. 11(a). And this structure is composed by eight 4x4 unit in 
one direction which is given in Fig. 11(b). The parameters of 
the microstrip patch array are l = 41 mm, w = 49 mm, Pa= 100 
mm, p= 100 mm, tı = 5.0 mm, t2= 1.5 mm, and the thickness 
and relative dielectric constant of the substrate are 1.59 mm 
and 2.2, respectively. The working frequency is 9.42 GHz and 
the number of the unknowns of the metallic layer is about 
310,000. The radiation patterns in both E- and H-plane are 
plotted in Fig. 11(c) and (d). A very narrow beam is formed as 
expected. The FFT will be applied at level 2 and 3. Because 
the structure is electrically large and complex, the 
convergence of the iteration process will be a big problem. To 
ensure convergence, the number of the spanning vector in 
GMRES is set to 1,000. The CPU time per iteration is about 
1.8 seconds which is still very efficient, and the memory 
requirement is 11 GB due to the large number of spanning 
vector. 


VII. CONCLUSIONS 


The development and improvement of the efficient fast 
algorithm, namely MLGFIM, is reviewed. Objects in 
free-space, periodic and multilayer structures are analyzed 
using this fast simulation algorithm. Various techniques, 
including: BE RBF interpolation functions, boundary 
clustered STG pattern, FFT technique and parallelization 
approaches, are used to enhance the efficiency of the 
MLGFIM. Seven examples are given to validate the accuracy, 
and demonstrate the improvement of the MLGFIM after using 
the above new techniques. 
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